Learning Rates: Evolution versus Temporal Difference Learning

Author

  • Simon M. Lucas
Abstract

Evidently, any learning algorithm can only learn on the basis of the information given to it. This paper presents an initial attempt to place an upper bound on the information rates attainable with standard co-evolution and with TDL. The upper bound for TDL is shown to be much higher than for evolution. To test how well these bounds correlate with actual learning, a simple two-player game called treasure hunt is devised. Initial results show that the rank order of learning efficiency can be predicted by the information rate upper bounds.
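The intuition behind these bounds can be illustrated with a back-of-the-envelope sketch (the game length and per-move bit budget below are hypothetical assumptions, not figures from the paper): co-evolution observes only the final game outcome, while TDL receives a training signal after every move.

```python
import math

# Co-evolution: the only feedback per game is the outcome
# (win / lose / draw), so at most log2(3) bits per game.
def coevolution_bits_per_game(n_outcomes=3):
    return math.log2(n_outcomes)

# TDL: feedback arrives after every move.  If each of m moves
# yields at most b bits of information (e.g. a bounded-precision
# TD error), the per-game upper bound scales with game length.
def tdl_bits_per_game(moves_per_game=30, bits_per_move=1.0):
    return moves_per_game * bits_per_move

evo = coevolution_bits_per_game()   # ~1.58 bits per game
tdl = tdl_bits_per_game()           # 30 bits under these assumptions
print(f"co-evolution: {evo:.2f} bits/game, TDL: {tdl:.2f} bits/game")
```

Under any reasonable choice of constants the TDL bound dominates, which is the qualitative ordering the abstract reports.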


Similar resources

The Significance of Temporal-Difference Learning in Self-Play Training: TD-Rummy versus EVO-rummy

Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...
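The per-move value update this abstract refers to can be sketched as a minimal TD(0) rule with a linear approximator (a hypothetical illustration; the feature vectors and step size are assumptions, not the rummy agent's actual representation):

```python
# TD(0) with a linear value approximator: V(s) = w . phi(s).
# After each transition s -> s', nudge the weights toward the
# bootstrapped target r + gamma * V(s').
def td0_update(w, phi_s, phi_s_next, reward, gamma=0.9, alpha=0.1):
    v_s = sum(wi * xi for wi, xi in zip(w, phi_s))
    v_next = sum(wi * xi for wi, xi in zip(w, phi_s_next))
    delta = reward + gamma * v_next - v_s          # TD error
    return [wi + alpha * delta * xi for wi, xi in zip(w, phi_s)]

# Toy usage: two-feature states, terminal reward of 1.
w = [0.0, 0.0]
w = td0_update(w, phi_s=[1.0, 0.0], phi_s_next=[0.0, 1.0], reward=1.0)
print(w)  # first feature's weight is nudged toward the reward
```

In self-play training, both players would share these weights and the update would run after every move of every game.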


Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller

One of the most important issues that we face in controlling delayed systems and non-minimum-phase systems is fulfilling objective orientations simultaneously and in the best possible way. In this paper, proposing a new method, an objective orientation is presented for controlling multi-objective systems. The principles of this method are based on emotional temporal difference learning, and it has a...


Temporal-Difference Learning in Self-Play Training

Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...


Word clustering effect on vocabulary learning of EFL learners: A case of semantic versus phonological clustering

The aim of this study is to determine the effect of the word clustering method on vocabulary learning of Iranian EFL learners through a case of semantic versus phonological clustering. To this end, 80 homogeneous students from four intermediate classes at an English institute in Torbat e Heydariyeh participated in this research. They were assigned to four groups according to semantic versus phon...


Learning to control forest fires with ESP

Reinforcement Learning (Kaelbling et al., 1996) can be used to learn to control an agent by letting it interact with its environment. In general there are two kinds of reinforcement learning: (1) value-function-based reinforcement learning, which is based on the use of heuristic dynamic programming algorithms such as temporal difference learning (Sutton, 1988) and Q-learning (Watkins, 1989), a...
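The second algorithm the passage names, Q-learning (Watkins, 1989), can be sketched in its tabular form (the state/action names and the learning-rate and discount values below are illustrative assumptions):

```python
from collections import defaultdict

# Tabular Q-learning: off-policy update toward the greedy
# one-step target  r + gamma * max_a' Q(s', a').
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy usage: one transition with reward 1 from an all-zero table.
Q = defaultdict(float)
actions = ["left", "right"]
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
print(Q[(0, "right")])  # moved halfway toward the target of 1.0
```

Value-function methods like this estimate long-term return per state-action pair, in contrast to the policy-search (e.g. evolutionary) family the truncated sentence goes on to describe.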




Journal:

Volume   Issue

Pages  -

Publication date: 2008